Skip to content

fix(query): Remove duplicated rows when datasets shared Publicly in the hub page#6016

Open
Mrudhulraj wants to merge 11 commits into
apache:mainfrom
Mrudhulraj:fix/dataset-fetch-query-fix
Open

fix(query): Remove duplicated rows when datasets shared Publicly in the hub page#6016
Mrudhulraj wants to merge 11 commits into
apache:mainfrom
Mrudhulraj:fix/dataset-fetch-query-fix

Conversation

@Mrudhulraj

@Mrudhulraj Mrudhulraj commented Jun 29, 2026

Copy link
Copy Markdown

What changes were proposed in this PR?

Issue - Duplicate datasets on hub landing page / hub search

Symptom: A user creates a dataset, makes it public, and grants another user explicit access. When the grantee browses the hub, the dataset appears twice in the search results.

Root cause:

DatasetSearchQueryBuilder.constructFromClause produced this SQL:

path: amber\src\main\scala\org\apache\texera\web\resource\dashboard\DatasetSearchQueryBuilder.scala:72

  SELECT DISTINCT ...
  FROM dataset
  LEFT JOIN dataset_user_access ON dua.did = dataset.did
  LEFT JOIN "user" ON ...
  WHERE (dua.uid = <ME>) OR (dataset.is_public = true)

For a dataset that is both public AND explicitly shared with the user, the LEFT JOIN produces one row per matching dataset_user_access row and the OR makes both branches true.
This applies similarly to worflows too.

Fix 1 — DatasetSearchQueryBuilder.constructFromClause

Move the UID filter from the WHERE clause into the JOIN's ON clause so each dataset produces at most one joined row, and force the JOIN to FALSE when uid == null so the SELECT still references a valid
table.

  val baseJoin = DATASET
    .leftJoin(DATASET_USER_ACCESS)
    .on(DATASET_USER_ACCESS.DID.eq(DATASET.DID))
    .**and**(if (uid == null) DSL.**falseCondition**() else DATASET_USER_ACCESS.UID.eq(uid))
    .leftJoin(USER)
    .on(USER.UID.eq(DATASET.OWNER_UID))

  val condition: Condition =
    if (uid == null) {
      DATASET.IS_PUBLIC.eq(true)
    } else if (includePublic) {
      DATASET.IS_PUBLIC.eq(true).or(DATASET_USER_ACCESS.UID.isNotNull)
    } else {
      DATASET_USER_ACCESS.UID.isNotNull
    }

  baseJoin.where(condition)

Why AND FALSE for uid == null?
The SELECT references DATASET_USER_ACCESS.PRIVILEGE. Without dataset_user_access in the FROM, DB throws missing
FROM-clause entry for table "dataset_user_access". AND FALSE keeps the table in the FROM while making the JOIN yield NULL access columns — which is the correct semantic for "no explicit grant".

Behavior matrix:

uid includePublic Matched datasets
null (n/a) Public only
not null false Datasets with explicit access of logged-in user only
not null true Public + logged-in explicit access (no duplicates)

Any related issues, documentation, discussions?

Refs #5957

How was this PR tested?

Tested manually with database checks and UI workflow testing.

Was this PR authored or co-authored using generative AI tooling?

No AI tools were used in the process.

@github-actions

Copy link
Copy Markdown
Contributor

👋 Thanks for opening this pull request, @Mrudhulraj!

It looks like the pull request description doesn't quite follow our template yet:

  • The What changes were proposed in this PR? section is empty; please fill it in.

Filling out the template helps reviewers understand and triage your contribution faster. Please edit the description to complete it. This message will disappear automatically once the template is followed.

You can find the template prompts by editing the description, or see CONTRIBUTING.md for the full contribution flow.

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @xuang7
    You can notify them by mentioning @xuang7 in a comment.

@Mrudhulraj Mrudhulraj changed the title fix(query): Remove duplicated rows when dataset/workflows shared Publicly in the hub page fix(query): Remove duplicated rows when datasets shared Publicly in the hub page Jun 29, 2026
@carloea2

Copy link
Copy Markdown
Contributor

Add some specs to both PRs

@carloea2

Copy link
Copy Markdown
Contributor

Thanks, this is much more focused now.

The core fix looks right to me: filtering DATASET_USER_ACCESS by the current uid in the JOIN should prevent the public-dataset join fan-out, since dataset_user_access is keyed by (did, uid).

I have two requests:

  1. Please add a null-safe fallback for accessPrivilege.
    After this change, public datasets without an explicit access row for the current user will have no joined DATASET_USER_ACCESS.PRIVILEGE, so record.get(...) can be null. The frontend type expects "READ" | "WRITE" | "NONE", so I think we should return PrivilegeEnum.NONE instead of null.

    Example:

    val accessPrivilege =
    Option(record.get(DATASET_USER_ACCESS.PRIVILEGE, classOf[PrivilegeEnum]))
    .getOrElse(PrivilegeEnum.NONE)

  2. Please add a small spec for the duplicate case:

    • dataset is public
    • dataset is explicitly shared with the current user
    • search with includePublic=true
    • result contains exactly one dataset row, with the current user's privilege

Optional cleanup: the condition rewrite is logically okay, but the PR could be smaller if we keep the existing WHERE logic and only change the JOIN. Also, the new comments have a couple typos and “join is skipped” is slightly misleading because the join remains present but is forced to match no rows when uid == null.

@xuang7 xuang7 added the release/v1.2 back porting to release/v1.2 label Jun 29, 2026
@Mrudhulraj

Copy link
Copy Markdown
Author

I have fixed it and added appropriate comments.
Final fix: 594a697

@codecov-commenter

Copy link
Copy Markdown

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

⚠️ Benchmark changes need a look

🟢 2 better · 🔴 5 worse · ⚪ 8 noise (<±5%) · 0 without baseline

Compared against main 1a58433 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
🔴 bs=10 sw=10 sl=64 377 0.23 25,241/41,163/41,163 us 🔴 +21.5% / 🔴 +173.1%
🟢 bs=100 sw=10 sl=64 810 0.495 123,125/137,036/137,036 us 🟢 -14.0% / 🔴 +27.4%
bs=1000 sw=10 sl=64 932 0.569 1,064,666/1,125,263/1,125,263 us ⚪ within ±5% / 🔴 +9.4%
Baseline details

Latest main 1a58433 from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 377 tuples/sec 405 tuples/sec 777.62 tuples/sec -6.9% -51.5%
bs=10 sw=10 sl=64 MB/s 0.23 MB/s 0.247 MB/s 0.475 MB/s -6.9% -51.5%
bs=10 sw=10 sl=64 p50 25,241 us 23,715 us 12,612 us +6.4% +100.1%
bs=10 sw=10 sl=64 p95 41,163 us 33,887 us 15,070 us +21.5% +173.1%
bs=10 sw=10 sl=64 p99 41,163 us 33,887 us 18,360 us +21.5% +124.2%
bs=100 sw=10 sl=64 throughput 810 tuples/sec 797 tuples/sec 988.31 tuples/sec +1.6% -18.0%
bs=100 sw=10 sl=64 MB/s 0.495 MB/s 0.486 MB/s 0.603 MB/s +1.9% -17.9%
bs=100 sw=10 sl=64 p50 123,125 us 123,089 us 101,066 us +0.0% +21.8%
bs=100 sw=10 sl=64 p95 137,036 us 159,269 us 107,594 us -14.0% +27.4%
bs=100 sw=10 sl=64 p99 137,036 us 159,269 us 115,830 us -14.0% +18.3%
bs=1000 sw=10 sl=64 throughput 932 tuples/sec 918 tuples/sec 1,019 tuples/sec +1.5% -8.6%
bs=1000 sw=10 sl=64 MB/s 0.569 MB/s 0.56 MB/s 0.622 MB/s +1.6% -8.6%
bs=1000 sw=10 sl=64 p50 1,064,666 us 1,085,862 us 986,982 us -2.0% +7.9%
bs=1000 sw=10 sl=64 p95 1,125,263 us 1,139,586 us 1,028,491 us -1.3% +9.4%
bs=1000 sw=10 sl=64 p99 1,125,263 us 1,139,586 us 1,058,493 us -1.3% +6.3%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,529.80,200,128000,377,0.230,25241.47,41162.51,41162.51
1,100,10,64,20,2468.11,2000,1280000,810,0.495,123124.61,137036.14,137036.14
2,1000,10,64,20,21459.36,20000,12800000,932,0.569,1064666.45,1125262.78,1125262.78

Mrudhulraj and others added 10 commits June 29, 2026 21:44
…n the hub page

fix(dashboard): Remove duplicated rows when dataset shared Publicly in the hub page
…che#6015)

### What changes were proposed in this PR?

`AGENTS.md` documents the single-node and k8s deploy paths but never
tells an agent how to run the local dev stack. This PR:

- adds a "Where Things Live" row routing to `bin/local-dev.sh` — the
single entry point (infra in Docker; backend, frontend, and
agent-service run natively);
- expands the "Develop in a worktree" section to:
  - prefer `bin/local-dev.sh` while developing;
- bounce the stack across worktree switches (`down` in the old worktree,
`up` in the new one), since the native services bind fixed ports and
share one PID/state dir, so only one worktree's stack runs at a time;
- use the non-interactive CLI subcommands — the interactive TUI (`-i`)
is for humans, not agents.
- drops "Local" from the single-node / k8s deploy row — both can deploy
anywhere, not only locally.

Only the wrapper is surfaced, never the internal
`bin/local-dev/main.sh`.

### Any related issues, documentation, discussions?

Closes apache#6014

### How was this PR tested?

Docs-only change; no tests. Verified the linked
`bin/local-dev/README.md` path resolves and that the described
subcommands (`up` / `down` / `status` / `logs` / `-i`) match
`bin/local-dev.sh --help`.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)
…omote to errors (apache#5985)

### What changes were proposed in this PR?

Fix the two remaining Angular extended-diagnostic warnings surfaced by
the frontend build, add a unit test for the affected logic, and promote
both checks to build errors so regressions can't silently creep back in:

- **NG8113** — `CardItemComponent` listed `NzWaveDirective` in its
standalone `imports` array but never used it in the template. Removed
the unused directive import.
- **NG8107** — `computing-unit-selection.component.html` used
`pve.name?.trim()`, but `PveDraft.name` is a non-nullable `string`, so
the optional chaining is redundant. Extracted the predicate into an
`isCreateDisabled(pve)` component method bound via
`[disabled]="isCreateDisabled(pve)"`, and added unit tests covering
empty / whitespace / valid / padded names (the create button stays
disabled until the env name has non-whitespace content).
- **Enforcement** — set `unusedStandaloneImports` (NG8113) and
`optionalChainNotNullable` (NG8107) to `"error"` under
`angularCompilerOptions.extendedDiagnostics.checks`. Placed in
`tsconfig.app.json` (the prod build, `build:ci`) rather than the base
tsconfig, because `extendedDiagnostics` requires `strictTemplates`,
which the unit-test builds deliberately disable; `tsconfig.test.json`
clears the inherited option to avoid `NG4003`.

No runtime behavior change.

### Any related issues, documentation, discussions?

Closes apache#5984

### How was this PR tested?

- `ng test` (the affected spec): **46 passed**, including the 4 new
`isCreateDisabled` cases.
- `ng build`: succeeds with **0** NG8113/NG8107 diagnostics (now
configured as errors) and produces the normal bundle.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rkflowExecutionService (apache#5922)

### What changes were proposed in this PR?

Consolidates execution init-error reporting into a **single site owned
by `WorkflowExecutionService`**, replacing the two-phase split apache#5781
introduced (per @Yicong-Huang's suggestion, tracked in apache#5921).

- `WorkflowExecutionService` now registers its `fatalErrors →
WorkflowErrorEvent` (and state) diff handler as the **first**
construction action. Construction itself does no external work and
cannot throw — it only assigns `workflowSettings`, creates a
`WebsocketInput` (a `PublishSubject`), and registers the handler. All
throwing work lives in `executeWorkflow()`, which runs **after**
`executionService.onNext(...)` publishes the execution, so any failure
there is recorded by `errorHandler` into the metadata store and surfaced
by the handler that `connectToExecution` forwards.
- `WorkflowService.initExecutionService` drops the pre-publish
`errorSubject` fallback: the `executionPublished` gating and
`reportFatalErrorsToSubscribers` are gone, and the catch is simply
`errorHandler(e)`. The now-unused `errorSubject` field and its
`connect()` subscription are removed.
- `WorkflowServiceSpec` is removed — it only tested the deleted
`reportFatalErrorsToSubscribers`; the surfacing behavior is exercised by
the integration/e2e suites.

Net: a single reporting path (the metadata-store diff handler), with no
change to the init-error surfacing apache#5781 added — construction is
provably side-effect-free, so the pre-publish window no longer has a
failure mode.

### Any related issues, documentation, discussions?

Resolves apache#5921 — the follow-up refactor agreed during the apache#5781 review.

### How was this PR tested?

`scalafmtCheckAll` clean. This is a behavior-preserving refactor of
init-error reporting: the single reporting path is the metadata-store
diff handler that already surfaced post-publish errors, and construction
is side-effect-free so no error can escape it. The full compile,
scalafix, and the integration/e2e suites that exercise init-error
surfacing run in CI.

### Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
### What changes were proposed in this PR?

The Codecov patch status was `informational: true`, so it never gated
PRs. This makes it a real requirement:

- set `target: 60%` on `coverage.status.patch.default`;
- add `threshold: 10%` so the patch passes when coverage is at least
`target - threshold` = 50%, softening the quantization on tiny diffs
(e.g. a 2-coverable-line change passes at 1 line / 50%);
- drop the `informational` flag so the patch check can fail.

PRs must now cover at least 50% of the lines they change (60% target,
10% slack). Project status (auto target, 1% threshold), flag
carryforward, and ignore rules are unchanged. The explanatory comment in
`codecov.yml` is updated to match.

### Any related issues, documentation, discussions?

Closes apache#6021

### How was this PR tested?

Config-only change; `codecov.yml` validated as well-formed YAML. Codecov
applies the new policy on the next PR upload.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)
…nt (apache#5949)

<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->

> [Stacked on apache#5947]

This PR consolidates the public and private dataset listings to use the
shared CardItemComponent.
- Replace texera-dataset-card-item with the shared read-only
texera-card-item in the public hub.
- Remove the unused DatasetCardItemComponent and its tests.
- Align the hub list/card toggle with the workflow and private dataset
listings.
- Update the like button styling on the shared card.

This keeps dataset cards consistent across all listings and reduces
duplicated code.

Demo:
<img width="1258" height="400" alt="public-datasets"
src="https://github.com/user-attachments/assets/49a083f1-d2a4-47f5-99a4-d70f90ccd947"
/>

### Any related issues, documentation, discussions?
<!--
Please use this section to link other resources if not mentioned
already.
1. If this PR fixes an issue, please include `Fixes apache#1234`, `Resolves
apache#1234`
or `Closes apache#1234`. If it is only related, simply mention the issue
number.
  2. If there is design documentation, please add the link.
  3. If there is a discussion in the mailing list, please add the link.
-->
Closes apache#5948 

### How was this PR tested?
<!--
If tests were added, say they were added here. Or simply mention that if
the PR
is tested with existing test cases. Make sure to include/update test
cases that
check the changes thoroughly including negative and positive cases if
possible.
If it was tested in a way different from regular unit tests, please
clarify how
you tested step by step, ideally copy and paste-able, so that other
reviewers can
test and check, and descendants can verify in the future. If tests were
not added,
please describe why they were not added and/or why it was difficult to
add.
-->
All existing tests passed, and the changes were manually tested.

### Was this PR authored or co-authored using generative AI tooling?
<!--
If generative AI tooling has been used in the process of authoring this
PR,
please include the phrase: 'Generated-by: ' followed by the name of the
tool
and its version. If no, write 'No'. 
Please refer to the [ASF Generative Tooling
Guidance](https://www.apache.org/legal/generative-tooling.html) for
details.
-->
Generated-by: Claude Opus 4.8
…nent (apache#5566)

⚠️ This PR is stacked on apache#5574. Until that lands, the diff below may
also include PR 5's QA/ranking task changes depending on which base
GitHub is showing. The new code in this PR is the HuggingFaceComponent
(task selector + model browser) under
frontend/src/app/workspace/component/hugging-face/, plus the formly
registration in formly-config.ts and the declaration in app.module.ts.
Once PR apache#5574 merges and this PR is retargeted to main, the diff should
auto-clean to the PR 6a frontend selector changes only.

### What changes were proposed in this PR?

Add `HuggingFaceComponent`, a custom formly field type (`huggingface`)
that provides:
- A task dropdown listing all supported HuggingFace inference tasks
(fetched from the Texera backend's `/huggingface/tasks` endpoint, with a
static fallback list)
- A paginated model list with client-side search, fetched from the
Texera backend's `/huggingface/models` endpoint (which proxies
HuggingFace Hub)
- Per-task field state preservation — when switching tasks, previously
entered values (modelId, promptColumn, etc.) are saved and restored

This PR registers the component in `formly-config.ts` and declares it in
`AppModule`. The component is not yet wired into the HuggingFace
operator's property editor; the `jsonSchemaMapIntercept` mapping that
routes the `modelId` field to this component is added in the follow-up
property-editor PR (PR 7).

### Any related issues, documentation, discussions?

- Tracking issue: apache#5314
- Closes: apache#5314
- Stacked on: PR tracked in issue apache#5292
- Parent issue: apache#5041

### How was this PR tested?

7 unit tests added in `hugging-face.component.spec.ts` covering:
- Static task list is non-empty and contains expected tasks
(text-generation, image tasks, audio tasks, QA/ranking tasks)
- Task tags are unique
- Cache invalidation does not throw

Run with `ng test`.

### Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.7

---------

Signed-off-by: Elliot Lin <36275109+ELin2025@users.noreply.github.com>
Co-authored-by: Elliot <36275109+Falcons-Royale@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
### What changes were proposed in this PR?

Add `HuggingFaceAudioUploadComponent`, a custom formly field type
(`huggingface-audio-upload`) that provides:
- An audio file picker that uploads to the Texera backend's
`/huggingface/upload-audio` endpoint for server-side storage
- Authenticated audio preview/playback — fetches server-stored audio via
`HttpClient` (which carries the JWT) and creates a blob URL for the
`<audio>` element, avoiding 401s on workflow reload
- Local audio preview using a browser Object URL while upload is in
progress
- Stale response guards — discards upload results and preview fetches if
the user clears the field while a request is in flight
- Concurrent upload protection — disables the file input and Clear
button during upload
- Storage of the returned server path in the formly form control
- A guidance message explaining why audio is uploaded to backend storage
rather than embedded in the workflow JSON

This PR also registers `HuggingFaceComponent` and
`HuggingFaceAudioUploadComponent` in `formly-config.ts` and declares
them in `AppModule`. The `jsonSchemaMapIntercept` mapping that routes
the `audioInput` field to this component is added in the follow-up
property-editor PR (PR 7).

### Any related issues, documentation, discussions?

- Tracking issue: apache#5314
- Closes: apache#5314 
- Stacked on: apache#5566
- Parent issue: apache#5041

### How was this PR tested?

38 unit tests in `hugging-face-audio-upload.component.spec.ts` covering:
- `ngOnInit` — filename extraction from server paths, data URLs,
empty/whitespace values
- `previewSrc` — returns data URLs as-is, empty for server paths (loaded
async), empty for blank values
- `onFileSelected` — successful upload, non-audio rejection, concurrent
upload guard, no-file guard, upload-in-progress state, error handling,
filename fallback, model updates, Content-Type/URL encoding
- `loadServerAudioPreview` — successful blob fetch, fetch failure error
message, stale response discard on both success and error, no-fetch for
data URLs, URL encoding
- `clearAudio` — full state reset, error preservation, model clearing,
dirty/touched marking
- Stale upload guards — discard success/error when cleared during flight
- `ngOnDestroy` — blob URL revocation
- `getDisplayName` — forward/backslash paths, trailing separator, flat
filename

Run with `ng test`.

### Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.6

---------

Co-authored-by: Elliot <36275109+Falcons-Royale@users.noreply.github.com>
Co-authored-by: Anish Shivamurthy <anish@uci.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e#6025)

### What changes were proposed in this PR?

The usage banner in `bin/local-dev/main.sh` claimed Docker infra is
**not**
managed by the script and that you must bring it up yourself before
`up`. That
contradicts the actual code: `cmd_up` calls `infra_up` (which runs
`docker
compose up -d` against project `texera-local-dev`) and `cmd_down` calls
`infra_down` for any docker targets.

```
Before:  comment says "infra is NOT managed here -- bring those up yourself"
After:   comment says "infra IS managed: `up` brings it up, `down` tears it down"
```

Comment-only change; no behavior change.

### Any related issues, documentation, discussions?

Related to apache#5960 (introduced `bin/local-dev` and this banner) and apache#6007
(wired
`infra_up` into `up`/`auto`, which is what made the banner stale). No
dedicated
issue — documentation/comment-only fix (issue-first is excepted for
docs). The
sibling `bin/local-dev/README.md` already documents the correct behavior
("infra in docker, JVM + frontend native"); this just realigns the
script's
own banner.

### How was this PR tested?

No tests — comment-only change. Verified the described behavior by
reading
`cmd_up` → `infra_up` (`docker compose up -d`, project
`texera-local-dev`) and
`cmd_down` → `infra_down`, and by running `bin/local-dev.sh up`, which
brought
up postgres/minio/lakefs/lakekeeper/litellm automatically (14/14
services
healthy) with no manual `docker compose` step.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)
…at (apache#6027)

### What changes were proposed in this PR?

Two small, agent/script-friendly additions to `bin/local-dev.sh` — no
change to the human TTY experience.

**1. `--json` machine-readable status.** `status --json` prints one JSON
object on stdout (no colours, no table) and exits `0` iff every service
is running, else `1`:

```json
{"branch":"...","sha":"...","running":14,"total":14,"services":[
  {"service":"texera-web","port":8080,"type":"jvm","pid":25823,"state":"running"}, ...]}
```

`up` and `down` also accept `--json`: human progress is routed to
**stderr** (unbuffered) and the final status JSON goes to **stdout**
(via a saved fd), so a caller can `up --json >state.json 2>progress.log`
and parse stdout directly.

**2. Non-TTY build heartbeat.** In non-TTY mode the spinner can't render
in place, so a long silent step (`sbt dist`, output redirected to a log)
used to print one line then go quiet for 25s+ — indistinguishable from
"stuck" to a non-interactive caller. `tui_spinner` now emits `… still
running (Ns)` every `TUI_HEARTBEAT_SECS` (default 15), polling at 1s so
it still returns within ~1s of the job finishing (no trailing latency).

### Any related issues, documentation, discussions?

Closes apache#6026. Usage banner (`--help`) updated to document `--json` on
`status`/`up`/`down`.

### How was this PR tested?

- `bash bin/local-dev/tests/test_local_dev_sh.sh` → **19 passed, 0
failed** (added 6: JSON shape/consistency, health-based exit code,
unknown-flag negative case, `--help` coverage, heartbeat regression
guard, up/down `--json` wiring).
- `python -m pytest bin/local-dev/tests/` → **38 passed** (no
regression).
- Dogfooded end to end: `down --json` → stdout pure JSON (`running:0`,
14 `stopped`), exit 0; `up --json` → single-line JSON `running:14/14` on
stdout, and stderr showed the new heartbeat across a ~60s build (`…
still running (15s/30s/45s/60s)`), 14/14 healthy.
- Verified stdout/stderr separation: `up --json >state.json
2>progress.log` yields parseable JSON in `state.json`.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)
@Mrudhulraj Mrudhulraj force-pushed the fix/dataset-fetch-query-fix branch from a7bb653 to f65e408 Compare June 30, 2026 04:46
@github-actions github-actions Bot added frontend Changes related to the frontend GUI ci changes related to CI docs Changes related to documentations infra labels Jun 30, 2026
@github-actions github-actions Bot removed frontend Changes related to the frontend GUI ci changes related to CI docs Changes related to documentations infra labels Jun 30, 2026
@carloea2

Copy link
Copy Markdown
Contributor

I did not see testing of this PR, could you please add it?

@Mrudhulraj

Copy link
Copy Markdown
Author

@carloea2 can you guide me here? I just did some manual testing and database checks here.

@carloea2

Copy link
Copy Markdown
Contributor

Manual checks are useful, but this needs an automated DB-backed regression spec.

Take a look at:
amber/src/test/scala/org/apache/texera/web/resource/dashboard/file/WorkflowResourceSpec.scala

That spec already uses MockTexeraDB from:
common/dao/src/test/scala/org/apache/texera/dao/MockTexeraDB.scala

Please add a regression test under amber/src/test/scala/org/apache/texera/web/resource/dashboard/, for example DatasetSearchQueryBuilderSpec.scala, using MockTexeraDB.

  • creates a public dataset owned by one user
  • adds a DATASET_USER_ACCESS row for the searching user
  • calls dashboardResource.searchAllResourcesCall(..., SearchQueryParams(resourceType = "dataset"), includePublic = true)
  • asserts the result contains exactly one dataset row for that did
  • asserts accessPrivilege is the searching user’s privilege

@Yicong-Huang

Copy link
Copy Markdown
Contributor

Thanks @Mrudhulraj and @carloea2. Let's make sure we have test coverage on the code change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

engine fix release/v1.2 back porting to release/v1.2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants